Supplemental Material for "Model Selection for Production System via Automated Online Experiments"

A Experiment Details
We use the default setting of BO in GPyOpt, where the surrogate model is a Gaussian process (GP) regression model with Gaussian observation noise and a Matérn 5/2 kernel. For the recommender system experiment, however, there are no natural representations for the candidate models.

Off-policy evaluation (OPE) methods can provide an estimate of the accumulative metric, but IS-g and DR-g suffer from the fact that there is no exploration mechanism.

We simulate the "online" deployment scenario as follows: a multi-class classifier is given a set of inputs; for each input, the classifier returns a prediction of the label, and only binary immediate feedback, indicating whether the predicted class is correct, is available.
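As a minimal sketch of the importance-sampling idea behind OPE estimators such as IS, logged feedback can be reweighted by the ratio of the candidate model's propensity to the logging policy's propensity. Everything below (the uniform logging policy, the simulated labels, the deterministic candidate model) is a hypothetical stand-in, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 10_000, 5                      # logged interactions, number of classes
logging_probs = np.full(k, 1.0 / k)   # uniform logging policy (hypothetical)
actions = rng.integers(0, k, size=n)  # actions taken by the logging policy
labels = rng.integers(0, k, size=n)   # simulated true labels
feedback = (actions == labels).astype(float)  # binary immediate feedback

# A deterministic candidate model's predictions on the same inputs; its
# propensity for the logged action is 1 if the predictions match, else 0.
candidate_actions = rng.integers(0, k, size=n)
target_probs = (candidate_actions == actions).astype(float)

# Importance-sampling estimate of the accumulative metric:
# E_pi[feedback] ~ mean( pi(a|x) / mu(a|x) * feedback )
weights = target_probs / logging_probs[actions]
is_estimate = np.mean(weights * feedback)
print(f"IS estimate of accumulative metric: {is_estimate:.3f}")
```

Here the candidate's true metric is 1/k = 0.2, and the reweighted average recovers it in expectation; the high variance of such estimates when the candidate rarely agrees with the logging policy is one motivation for the doubly-robust (DR) variants.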
We consider model selection for production systems (MSPS), in which model selection is achieved by sequentially deploying a list of candidate models.
We thank the reviewers for their in-depth reviews. We will first answer the comments shared by multiple reviewers and then address the individual comments. We will add additional details about sparse GPs, VI, and binary observations in the supplementary material. In the common industrial scenario, one runs A/B tests for a few weeks and then selects the best model. Model selection for a time-sensitive system is an interesting and open research question for future work.
Review for NeurIPS paper: Model Selection for Production System via Automated Online Experiments
Summary and Contributions: The paper proposes a model selection algorithm called Model Selection with Automated Online Experiments (AOE) that is designed for use in production systems. In the problem statement, it is stated that the goal of the model selection problem is to select the model from a set of candidate models that maximises a metric of interest. It is assumed that the metric of interest can be expressed as the average immediate feedback from each of a model's predictions. AOE uses both historical log data and data collected from a small budget of online experiments to inform the choice of model. A distribution for the accumulative metric, or expected immediate feedback, is derived.
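In symbols (the notation here is ours, not necessarily the paper's), the accumulative metric of a candidate model $\pi$ is the expected immediate feedback over the input distribution:

```latex
v(\pi) = \mathbb{E}_{x \sim p(x)}\,\mathbb{E}_{a \sim \pi(a \mid x)}\!\left[ r(x, a) \right]
```

where $p(x)$ is the input distribution, $\pi(a \mid x)$ the model's prediction distribution, and $r(x, a)$ the immediate feedback for prediction $a$ on input $x$.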
Model Selection for Production System via Automated Online Experiments
Zhenwen Dai, Praveen Chandar, Ghazal Fazelnia, Ben Carterette, Mounia Lalmas-Roelleke
A challenge that machine learning practitioners in industry face is the task of selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimation of the effectiveness of the whole system, but can only compare two or a few models due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. We derive the probability distribution of the metric of interest, which contains the model uncertainty, from our Bayesian surrogate model trained using historical logs. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set that balances exploration and exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
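The sequential select-deploy-update loop described above can be sketched with a simple UCB-style acquisition as a stand-in for the paper's surrogate-based criterion; the candidate pool, noise level, and budget below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each candidate model has an unknown true metric, and
# deploying a model online yields one noisy observation of that metric.
n_models, budget, noise_std = 10, 40, 0.05
true_metrics = rng.uniform(0.1, 0.9, size=n_models)  # unknown to the selector

counts = np.zeros(n_models)
means = np.zeros(n_models)

for t in range(1, budget + 1):
    # Optimistic estimate: running mean plus an exploration bonus that
    # shrinks as a candidate accumulates deployments.
    ucb = means + np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1e-9))
    choice = int(np.argmax(ucb))

    # "Deploy" the chosen model and observe a noisy metric value.
    observed = true_metrics[choice] + rng.normal(0.0, noise_std)
    counts[choice] += 1
    means[choice] += (observed - means[choice]) / counts[choice]

best = int(np.argmax(means))
print("selected model:", best, "estimated metric:", round(means[best], 3))
```

The paper's method instead builds a Bayesian surrogate of the metric from historical logs, so candidates need not each be deployed before informative comparisons can be made; this sketch only illustrates the exploration-exploitation trade-off in the deployment loop.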